{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#用随机梯度下降处理回归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本主题将介绍随机梯度下降法（Stochastic Gradient Descent，SGD），我们将用它解决回归问题，后面我们还用它处理分类问题。\n",
    "\n",
    "<!-- TEASER_END -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##Getting ready"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SGD是机器学习中的无名英雄（unsung hero），许多算法的底层都有SGD的身影。之所以受欢迎是因为其简便与快速——处理大量数据时这些都是好事儿。\n",
    "\n",
    "SGD成为许多机器学习算法的核心的另一个原因是它很容易描述过程。在本章的最后，我们对数据作一些变换，然后用模型的损失函数（loss function）拟合数据。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##How to do it..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果SGD适合处理大数据集，我们就用大点儿的数据集来演示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1,000,000\n"
     ]
    }
   ],
   "source": [
    "from sklearn import datasets\n",
    "X, y = datasets.make_regression(int(1e6))\n",
    "print(\"{:,}\".format(int(1e6)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "值得进一步了解数据对象的构成和规模信息。还在我们用的是NumPy数组，所以我们可以获得`nbytes`。Python本身没有获取NumPy数组大小的方法。输出结果与系统有关，你的结果和下面的数据可能不同："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "800,000,000\n"
     ]
    }
   ],
   "source": [
    "print(\"{:,}\".format(X.nbytes))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们把字节码`nbytes`转换成MB（megabytes），看着更直观："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "800.0"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.nbytes / 1e6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因此，每个数据点的字节数就是："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.nbytes / (X.shape[0] * X.shape[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这些信息和我们的目标没多大关系，不过了解数据对象的构成和规模信息还是值得的。\n",
    "\n",
    "现在，我们有了数据，就用`SGDRegressor`来拟合："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,\n",
       "       fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n",
       "       loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,\n",
       "       random_state=None, shuffle=True, verbose=0, warm_start=False)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "from sklearn import linear_model\n",
    "sgd = linear_model.SGDRegressor()\n",
    "train = np.random.choice([True, False], size=len(y), p=[.75, .25])\n",
    "sgd.fit(X[train], y[train])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里又出现一个“充实的（beefy）”对象。重点需要了解我们的损失函数是`squared_loss`，与线性回归里的残差平方和是一样的。还需要注意`shuffle`会对数据产生随机搅动（shuffle），这在解决伪相关问题时很有用。scikit-learn用`fit_intercept`方法可以自动加一列1。如果想从拟合结果中看到很多输出，就把`verbose`设为1。用scikit-learn的API预测，我们可以统计残差的分布情况："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "linear_preds = sgd.predict(X[~train])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgQAAAFsCAYAAACzTaE8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2cnWV97/vPN4k8P4QYDQQSghq2jcWNYo371MooGFNq\ngS3Kg8oJ3dl292QfKu3eFagenRxsBFvloVtoT6USUBAsFVAQk6Ij6G6YimApMQLaUBJMooEkiDUk\n8Dt/rDtxZTLJTOYhM4TP+/Var7nX777ua13XzCTru+6nSVUhSZJe3MaM9AAkSdLIMxBIkiQDgSRJ\nMhBIkiQMBJIkCQOBJEnCQCCpF0nel+TrO1nflWTuELxOR5LHB9uPpMEzEEh7gCTLk/wiydNJViW5\nLslBA+2vqr5QVe/YWZPmIWkPYSCQ9gwFvLOqDgT+I3AM8JGRHZKkFxIDgbSHqarVwCLgNQBJ3pTk\nfyd5KskDSY7f0jbJOUl+lGRDkh8neW9b/Z62dm9PsizJuiR/CaRtXWeS69qeT0vyfJIxzfPfS7K0\neY0fJfn9HY09yflJVjRtlyV52xB+ayTthIFA2nMEIMkRwGzg3iSHA18F/t+qOgT4n8DNSV6aZH/g\ncmB2VR0E/Cfgge06TSYCNwN/CrwU+BHwm21N+jp0sBr4neY1fg+4NMnrenmd/wD8d+ANTdtZwPJ+\nzl3SIBkIpD1DgFuSbAD+jdab9p8B7wfuqKo7AarqH4DvAr9D6438eeCYJPtW1eqqWtpL3ycB/1JV\nf19Vz1XVZcCqHq+9Q1V1R1X9a7N8N629F7/VS9PngL2B1yR5SVX9W1X9uL/fAEmDYyCQ9gwFnNJ8\nsu4A3gYcBxwJvKc5XPBUkqdofbo/tKp+AZwB/AHwRJKvNp/Se5oMrOhR6/eVAUl+O8mSJGub1z+J\n1p6GbSdQ9ShwHtAJrE5yQ5LD+vs6kgbHQCDtYZpP4X8JXEJrb8F1VXVI2+PAqvpk03ZRVc0CDgWW\nAX/TS5dPAFO2PEmS9ufAz4H92p4f2tZ2b1qHGz4JvLw5bHEHO9irUFU3VNVv0Qoy1cxB0m5gIJD2\nTJcBbwS+DfxukllJxibZp7n2//AkL09ySnMuwSbgGVq77Xu6g9Zu/P+cZBzwh7S96dM67+AtSaYk\nORi4sG3dXs3jZ8DzSX6b1rkB20lydJK3NSFiI/DLHYxH0jAwEEh7oKr6GbAQ+GPgZFonBK6htcfg\nf9D6hD4G+CNgJbCW1nH9/2tLF81jS1/vAS6m9cb+KlpBY8tr/QNwI/DPwD8BX2nb9mlaAeIm4Eng\nLODWnsNtvu4NfAL4KfATYCLbhgtJwyhVOz9BOMkHgf9K6z+Qv6mqy5NMoPUfwJG0zgI+varWNe0v\nBP4LrWT/h1W1qKkfB1wD7EPrJKcPNvW9gWuB19P6T+mMqnpsaKcpSZJ2Zqd7CJL8Oq0w8Bu0bnby\nziSvBC4AFlfV0cBdzXOSzKB1ktIMWpc9XdkcbwS4CphbVdOB6UlmN/W5wNqmfikeM5Qkabfr65DB\nq4F7q+qXVfUc8C3gNFq7IBc2bRYCpzbLpwA3VNWmqloOPArMbM4UPrCqupt217Zt097XzcAJg5uS\nJEnaVX0Fgn8BfivJhCT70bpc6AhgUnM3NGjddGRSs9zz8qQVwOG91Fc2dZqvjwNU1WZgfXNIQpIk\n7SbjdrayqpYluYTWjUSeoXU28XM92lQS/8iJJEkvYDsNBABV9bfA3wIk+TNan/RXJzm0qlY1hwPW\nNM1Xsu31yUc07Vc2yz3rW7aZSuvGKOOAg6vqyZ7jMHRIktS7qtrpHUP7o8/LDpO8vPk6FXgXcD1w\nGzCnaTIHuKVZvg04M8leSY4CpgPdVbUK2JBkZnOS4dn86tKj9r7eTeskxV5V1R79+NjHPjbiY3CO\nztE5Osc96fFimONQ6XMPAfB3SV5K68Yl86pqfZKLgZuSzKW57LB5w16a5CZgKbC5ab9ltPNoXXa4\nL233VgeuBq5L8gityw7PHJKZSZKkfuvPIYO39FJ7EjhxB+0XAAt6qd9H62+096xvpAkUkiRpZHin\nwlGko6NjpIcw7JzjnsE57hmco9r1eafC0SJJvVDGKknS7pKEGoKTCvtzDoEkaQj96gau0q4Zzg/G\nBgJJGgHu8dSuGu4g6TkEkiTJQCBJkgwEkiQJA4EkaTf4whe+wDve8Y4dru/o6ODqq68e9Ot0dXUx\nZcqUvhvuggMPPJDly5cPaZ+jkScVStIocN55naxbN3z9jx8Pl13W2a+206ZNY82aNYwdO5b999+f\nt7/97XzmM5/hoIMOGvDrv+997+N973vfDtcnGdGrL8455xymTJnCRRddtN26p59+egRGtPsZCCTt\n1HC/UQ3UrrzBvRCsWwfTpnUOW//Ll/e/7yR89atf5W1vexurV6/mHe94Bx//+Mf55Cc/OWzjG2kj\nHUh62rx5M+PG7d63aAOBpJ0a7jeqgdqVNzgN3KRJk5g1axYPPfTQ1tqSJUv44z/+Y37wgx9w5JFH\ncvnll3P88ccDcM0113DRRRfx05/+lIkTJ/Lxj3+c9773vVxzzTVcffXV3HPPPQAsXryYc889l1Wr\nVnH22WdvcxlmZ2cnP/rRj7juuusAWL58Oa94xSvYvHkzY8aM4XOf+xx//ud/zooVK3jZy17G+eef\nz+///u/3Ov5LLrmEv/zLv2TDhg1MnjyZK6+8kre97W29tt3RpaBjxozh0Ucf5RWveAXnnHMO+++/\nP4899hh33303M2bM4Prrr+cVr3gFAMuWLePcc8/le9/7Hi972cu46KKLeM973gPA7bffzkc+8hF+\n/OMfc/DBBzN37lw+9rGPbTPHz372s8yfP5+jjjqKrq6u/v6YhoTnEEiStrPlzXHFihXceeedzJw5\nE4CVK1fyzne+k49+9KM89dRT/MVf/AWnnXYaa9eu5ZlnnuGDH/wgd955Jxs2bOAf//EfOfbYY7fr\n+2c/+xmnnXYaCxYsYO3atbzyla/kO9/5ztb1fX1SnzRpErfffjsbNmzgc5/7HH/0R3/E/fffv127\nH/7wh3zmM5/hu9/9Lhs2bGDRokVMmzZtEN+VlhtvvJHOzk6eeuopXvWqV/HhD38YgGeeeYa3v/3t\nvP/97+enP/0pX/ziF5k3bx4/+MEPADjggAP4/Oc/z/r167n99tu56qqruPXWW7fp++6772bZsmV8\n/etfH/Q4d5WBQJK0jari1FNP5aCDDmLq1Km88pWv5CMf+QgAn//85znppJOYPXs2ACeeeCJveMMb\nuP3220nCmDFjePDBB/n3f/93Jk2axIwZM7br/4477uDXf/3Xede73sXYsWM577zzOPTQQ7d5/Z05\n6aSTOOqoowB4y1vewqxZs7bueWg3duxYNm7cyEMPPcSmTZuYOnXq1k/yA5WEd73rXbzhDW9g7Nix\nvO997+OBBx4A4Ktf/SpHHXUUc+bMYcyYMRx77LG8613v4ktf+hIAxx9/PK95zWsAOOaYYzjzzDP5\n1re+tU3/nZ2d7Lvvvuy9996DGudAGAgkSdtIwq233sqGDRvo6uriG9/4Bt/97ncBeOyxx/jSl77E\nIYccsvXxne98h1WrVrHffvtx44038ld/9VdMnjyZd77znfzwhz/crv8nnniCI444YpvarlwZ8LWv\nfY03velNvPSlL+WQQw7hjjvuYO3atdu1e9WrXsVll11GZ2cnkyZN4qyzzuInP/nJLn43tjdp0qSt\ny/vuuy8///nPgdb35t57793me3P99dezevVqAO69917e+ta38vKXv5zx48fz13/919uNe6ivkNgV\nBgJJ0g695S1v4dxzz+X8888HYOrUqZx99tk89dRTWx9PP/00H/rQhwCYNWsWixYtYtWqVbz61a/m\nAx/4wHZ9Tp48mccff3zr86ra5vkBBxzAL37xi63PV61atXV548aNnHbaaXzoQx9izZo1PPXUU5x0\n0kk73Ktw1llncc899/DYY4+RZOs8ejPYkwqnTp3K8ccfv9335jOf+QwA733vezn11FNZsWIF69at\n4w/+4A94/vnnh3QMg2EgkCTt1HnnnUd3dzf33nsv73//+/nKV77CokWLeO655/jlL39JV1cXK1eu\nZM2aNdx6660888wzvOQlL2H//fdn7Nix2/V30kkn8dBDD/HlL3+ZzZs3c8UVV2zzpn/sscdy9913\n8/jjj7N+/Xo+8YlPbF337LPP8uyzzzJx4kTGjBnD1772NRYtWtTruB9++GG+8Y1vsHHjRvbee2/2\n2WefXscDrVCyefNmfvnLX259bNq0qdd2O/I7v/M7PPzww3z+859n06ZNbNq0iX/6p39i2bJlAPz8\n5z/nkEMOYa+99qK7u5vrr79+VF3ZYCCQJO3UxIkTmTNnDpdccglHHHEEt956KwsWLODlL385U6dO\n5VOf+hRVxfPPP8+ll17K4Ycfzktf+lLuuecerrrqKmDby/omTpzIl770JS644AImTpzIo48+ypvf\n/Oatr3fiiSdyxhln8NrXvpbf+I3f4Hd/93e3bnvggQdyxRVXcPrppzNhwgRuuOEGTjnllG3Gu6Xt\nxo0bufDCC3nZy17GYYcdxs9+9rNtwkXPbS6++GL222+/rY8TTjih13Y938Tbx7Zo0SK++MUvcvjh\nh3PYYYdx4YUX8uyzzwJw5ZVX8tGPfpSDDjqIiy66iDPOOKPXfkZKXih/cStJvVDGKu1Jzjmnc9Re\ndnjNNZ0jPYwBaf5+/Ta10XRjIo1Ovf3etNUHnSa8D4EkjQK+WWukechAkiQZCCRJkoFAkiRhIJAk\nSRgIJEkSBgJJkoSXHUrSiBjpm9BIPRkIJGk38yZrGo36PGSQ5MIkDyV5MMn1SfZOMiHJ4iQPJ1mU\nZHyP9o8kWZZkVlv9uKaPR5Jc3lbfO8mNTX1JkiOHfpqSJGlndhoIkkwDPgC8vqqOAcYCZwIXAIur\n6mjgruY5SWYAZwAzgNnAlfnVfrGrgLlVNR2YnmR2U58LrG3qlwKXDNnsJElSv/S1h2ADsAnYL8k4\nYD/gCeBkYGHTZiFwarN8CnBDVW2qquXAo8DMJIcBB1ZVd9Pu2rZt2vu6Gdj+r0lIkqRhtdNAUFVP\nAp8C/o1WEFhXVYuBSVW1umm2GpjULE8GVrR1sQI4vJf6yqZO8/Xx5vU2A+uTTBjohCRJ0q7r65DB\nK4HzgGm03tQPSPL+9jbNnyD0DBlJkl7A+rrK4A3A/66qtQBJ/h74T8CqJIdW1armcMCapv1KYErb\n9kfQ2jOwslnuWd+yzVTgieawxMHNnontdHZ2bl3u6Oigo6Ojr/lJkrRH6erqoqura8j77SsQLAP+\nnyT7Ar8ETgS6gWeAObROAJwD3NK0vw24PsmnaR0KmA50V1Ul2ZBkZrP92cAVbdvMAZYA76Z1kmKv\n2gOBJEkvRj0/EM+fP39I+t1pIKiq7ye5Fvgu8DzwPeD/Aw4EbkoyF1gOnN60X5rkJmApsBmYV7+6\n4HYecA2wL3BHVd3Z1K8GrkvyCLCW1lUMkiRpN+rzxkRV9Ungkz3KT9LaW9Bb+wXAgl7q9wHH9FLf\nSBMoJEnSyPBvGUiSJAOBJEkyEEiSJAwEkiQJA4EkScJAIEmSMBBIkiT6cR8CScPvvPM6WbdupEfR\nu+7uB5g2baRHIWm4GQikUWDdOpg2rXOkh9Grb3/71L4bSXrB85CBJEkyEEiSJAOBJEnCQCBJkjAQ\nSJIkDASSJAkDgSRJwkAgSZIwEEiSJAwEkiQJA4EkScJAIEmSMBBIkiQMBJIkCQOBJEnCQCBJkjAQ\nSJIkDASSJIl+BIIk/yHJ/W2P9Un+MMmEJIuTPJxkUZLxbdtcmOSRJMuSzGqrH5fkwWbd5W31vZPc\n2NSXJDly6KcqSZJ2pM9AUFU/rKrXVdXrgOOAXwBfBi4AFlfV0cBdzXOSzADOAGYAs4Erk6Tp7ipg\nblVNB6Ynmd3U5wJrm/qlwCVDNUFJktS3XT1kcCLwaFU9DpwMLGzqC4FTm+VTgBuqalNVLQceBWYm\nOQw4sKq6m3bXtm3T3tfNwAm7OhFJkjRwuxoIzgRuaJYnVdXqZnk1MKlZngysaNtmBXB4L/WVTZ3m\n6+MAVbUZWJ9kwi6OTZIkDdC4/jZMshfwu8D5PddVVSWpoRxYbzo7O7cud3R00NHRMdwvKUnSqNLV\n1UVXV9eQ99vvQAD8NnBfVf20eb46yaFVtao5HLCmqa8EprRtdwStPQMrm+We9S3bTAWeSDIOOLiq\nnuw5gPZAIEnSi1HPD8Tz588fkn535ZDBWfzqcAHAbcCcZnkOcEtb/cwkeyU5CpgOdFfVKmBDkpnN\nSYZnA7f20te7aZ2kKEmSdpN+7SFIsj+tEwo/0Fa+GLgpyVxgOXA6QFUtTXITsBTYDMyrqi2HE+YB\n1wD7AndU1Z1N/WrguiSPAGtpnasgSZJ2k34Fgqp6BpjYo/YkrZDQW/sFwIJe6vcBx/RS30gTKCRJ\n0u7nnQolSZKBQJIkGQgkSRIGAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkYCCRJEgYC\nSZKEgUCSJGEgkCRJGAgkSRIGAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkYCCRJEgYC\nSZKEgUCSJGEgkCRJ9DMQJBmf5O+S/CDJ0iQzk0xIsjjJw0kWJRnf1v7CJI8kWZZkVlv9uCQPNusu\nb6vvneTGpr4kyZFDO01JkrQz/d1DcDlwR1X9GvBaYBlwAbC4qo4G7mqek2QGcAYwA5gNXJkkTT9X\nAXOrajowPcnspj4XWNvULwUuGfTMJElSv/UZCJIcDPxWVf0tQFVtrqr1wMnAwqbZQuDUZvkU4Iaq\n2lRVy4FHgZlJDgMOrKrupt21bdu093UzcMKgZiVJknZJf/YQHAX8NMnnknwvyd8k2R+YVFWrmzar\ngUnN8mRgRdv2K4DDe6mvbOo0Xx+HVuAA1ieZMJAJSZKkXdefQDAOeD1wZVW9HniG5vDAFlVVQA39\n8CRJ0u4wrh9tVgArquqfmud/B1wIrEpyaFWtag4HrGnWrwSmtG1/RNPHyma5Z33LNlOBJ5KMAw6u\nqid7DqSzs3PrckdHBx0dHf0YviRJe46uri66urqGvN8+A0Hzhv94kqOr6mHgROCh5jGH1gmAc4Bb\nmk1uA65P8mlahwKmA91VVUk2JJkJdANnA1e0bTMHWAK8m9ZJittpDwSSJL0Y9fxAPH/+/CHptz97\nCADOBb6QZC/gR8DvAWOBm5LMBZYDpwNU1dIkNwFLgc3AvOaQAsA84BpgX1pXLdzZ1K8GrkvyCLAW\nOHOQ85IkSbugX4Ggqr4P/EYvq07cQfsFwIJe6vcBx/RS30gTKCRJ0u7nnQolSZKBQJIkGQgkSRIG\nAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkYCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRIG\nAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkYCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRL9\nDARJlif55yT3J+luahOSLE7ycJJFSca3tb8wySNJliWZ1VY/LsmDzbrL2+p7J7mxqS9JcuRQTlKS\nJO1cf/cQFNBRVa+rqjc2tQuAxVV1NHBX85wkM4AzgBnAbODKJGm2uQqYW1XTgelJZjf1ucDapn4p\ncMkg5yVJknbBrhwySI/nJwMLm+WFwKnN8inADVW1qaqWA48CM5McBhxYVd1Nu2vbtmnv62bghF0Y\nlyRJGqRd2UPwD0m+m+QDTW1SVa1ullcDk5rlycCKtm1XAIf3Ul/Z1Gm+Pg5QVZuB9Ukm7MpEJEnS\nwI3rZ7vfrKqfJHkZsDjJsvaVVVVJauiHJ0mSdod+BYKq+knz9adJvgy8EVid5NCqWtUcDljTNF8J\nTGnb/AhaewZWNss961u2mQo8kWQccHBVPdlzHJ2dnVuXOzo66Ojo6M/wJUnaY3R1ddHV1TXk/fYZ\nCJLsB4ytqqeT7A/MAuYDtwFzaJ0AOAe4pdnkNuD6JJ+mdShgOtDd7EXYkGQm0A2cDVzRts0cYAnw\nblonKW6nPRBIkvRi1PMD8fz584ek3/7sIZgEfLm5UGAc8IWqWpTku8BNSeYCy4HTAapqaZKbgKXA\nZmBeVW05nDAPuAbYF7ijqu5s6lcD1yV5BFgLnDkEc5MkSf3UZyCoqn8Fju2l/iRw4g62WQAs6KV+\nH3BML/WNNIFCkiTtft6pUJIkGQgkSZKBQJIkYSCQJEkYCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRIG\nAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkYCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRIG\nAkmShIFAkiRhIJAkScC4kR6AJA1Ed/cSzjmnc6SH0avx4+GyyzpHehjSLjEQSHpBevbZfZg2rXOk\nh9Gr5cs7R3oI0i7r1yGDJGOT3J/kK83zCUkWJ3k4yaIk49vaXpjkkSTLksxqqx+X5MFm3eVt9b2T\n3NjUlyQ5cignKEmS+tbfcwg+CCwFqnl+AbC4qo4G7mqek2QGcAYwA5gNXJkkzTZXAXOrajowPcns\npj4XWNvULwUuGdyUJEnSruozECQ5AjgJ+Cyw5c39ZGBhs7wQOLVZPgW4oao2VdVy4FFgZpLDgAOr\nqrtpd23bNu193QycMODZSJKkAenPHoJLgT8Bnm+rTaqq1c3yamBSszwZWNHWbgVweC/1lU2d5uvj\nAFW1GVifZMIuzEGSJA3STgNBkncCa6rqfn61d2AbVVX86lCCJEl6AerrKoP/Azg5yUnAPsBBSa4D\nVic5tKpWNYcD1jTtVwJT2rY/gtaegZXNcs/6lm2mAk8kGQccXFVP9jaYzs7OrcsdHR10dHT0OUFJ\nkvYkXV1ddHV1DXm/Ow0EVfWnwJ8CJDke+J9VdXaSTwJzaJ0AOAe4pdnkNuD6JJ+mdShgOtBdVZVk\nQ5KZQDdwNnBF2zZzgCXAu2mdpNir9kAgSdKLUc8PxPPnzx+Sfnf1PgRbDg1cDNyUZC6wHDgdoKqW\nJrmJ1hUJm4F5zSEFgHnANcC+wB1VdWdTvxq4LskjwFrgzIFNRZIkDVS/A0FVfQv4VrP8JHDiDtot\nABb0Ur8POKaX+kaaQCFJkkaGf8tAkiQZCCRJkoFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkY\nCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRIGAkmShIFAkiRhIJAkSRgIJEkSBgJJkoSBQJIkYSCQJEkY\nCCRJEgYCSZKEgUCSJGEgkCRJGAgkSRIGAkmShIFAkiTRRyBIsk+Se5M8kGRpkk809QlJFid5OMmi\nJOPbtrkwySNJliWZ1VY/LsmDzbrL2+p7J7mxqS9JcuRwTFSSJO3YTgNBVf0SeGtVHQu8FnhrkjcD\nFwCLq+po4K7mOUlmAGcAM4DZwJVJ0nR3FTC3qqYD05PMbupzgbVN/VLgkqGcoCRJ6lufhwyq6hfN\n4l7AWOAp4GRgYVNfCJzaLJ8C3FBVm6pqOfAoMDPJYcCBVdXdtLu2bZv2vm4GThjwbCRJ0oD0GQiS\njEnyALAa+GZVPQRMqqrVTZPVwKRmeTKwom3zFcDhvdRXNnWar48DVNVmYH2SCQObjiRJGohxfTWo\nqueBY5McDHw9yVt7rK8kNVwDlCRJw6/PQLBFVa1PcjtwHLA6yaFVtao5HLCmabYSmNK22RG09gys\nbJZ71rdsMxV4Isk44OCqerK3MXR2dm5d7ujooKOjo7/DlzjvvE7WrRvpUfSuu/sBpk0b6VFIeiHo\n6uqiq6tryPvdaSBIMhHYXFXrkuwLvB2YD9wGzKF1AuAc4JZmk9uA65N8mtahgOlAd7MXYUOSmUA3\ncDZwRds2c4AlwLtpnaTYq/ZAIO2qdetg2rTOkR5Gr7797VP7biRJbP+BeP78+UPSb197CA4DFiYZ\nQ+t8g+uq6q4k9wM3JZkLLAdOB6iqpUluApYCm4F5VbXlcMI84BpgX+COqrqzqV8NXJfkEWAtcOaQ\nzEySJPXbTgNBVT0IvL6X+pPAiTvYZgGwoJf6fcAxvdQ30gQKSZI0MrxToSRJMhBIkiQDgSRJwkAg\nSZIwEEiSJAwEkiQJA4EkScJAIEmSMBBIkiQMBJIkCQOBJEnCQCBJkjAQSJIkDASSJAkDgSRJwkAg\nSZIwEEiSJAwEkiQJA4EkScJAIEmSMBBIkiQMBJIkCQOBJEnCQCBJkjAQSJIkDASSJAkDgSRJoh+B\nIMmUJN9M8lCSf0nyh019QpLFSR5OsijJ+LZtLkzySJJlSWa11Y9L8mCz7vK2+t5JbmzqS5IcOdQT\nlSRJO9afPQSbgD+qqtcAbwL+e5JfAy4AFlfV0cBdzXOSzADOAGYAs4Erk6Tp6ypgblVNB6Ynmd3U\n5wJrm/qlwCVDMjtJktQvfQaCqlpVVQ80yz8HfgAcDpwMLGyaLQRObZZPAW6oqk1VtRx4FJiZ5DDg\nwKrqbtpd27ZNe183AycMZlKSJGnX7NI5BEmmAa8D7gUmVdXqZtVqYFKzPBlY0bbZCloBomd9ZVOn\n+fo4QFVtBtYnmbArY5MkSQM3rr8NkxxA69P7B6vq6V8dBYCqqiQ1DOPbRmdn59bljo4OOjo6hvsl\nJUkaVbq6uujq6hryfvsVCJK8hFYYuK6qbmnKq5McWlWrmsMBa5r6SmBK2+ZH0NozsLJZ7lnfss1U\n4Ikk44CDq+rJnuNoDwSSJL0Y9fxAPH/+/CHptz9XGQS4GlhaVZe1rboNmNMszwFuaaufmWSvJEcB\n04HuqloFbEgys+nzbODWXvp6N62TFCVJ0m7Snz0Evwm8H/jnJPc3tQuBi4GbkswFlgOnA1TV0iQ3\nAUuBzcC8qtpyOGEecA2wL3BHVd3Z1K8GrkvyCLAWOHOQ85IkSbugz0BQVd9mx3sSTtzBNguABb3U\n7wOO6aW+kSZQSJKk3c87FUqSJAOBJEkyEEiSJAwEkiQJA4EkScJAIEmSMBBIkiQMBJIkCQOBJEnC\nQCBJkjCb/n6IAAAJOUlEQVQQSJIkDASSJAkDgSRJwkAgSZIwEEiSJAwEkiQJA4EkScJAIEmSMBBI\nkiQMBJIkCQOBJEnCQCBJkjAQSJIkDASSJAkDgSRJwkAgSZLoRyBI8rdJVid5sK02IcniJA8nWZRk\nfNu6C5M8kmRZkllt9eOSPNisu7ytvneSG5v6kiRHDuUEJUlS3/qzh+BzwOwetQuAxVV1NHBX85wk\nM4AzgBnNNlcmSbPNVcDcqpoOTE+ypc+5wNqmfilwySDmI0mSBqDPQFBV9wBP9SifDCxslhcCpzbL\npwA3VNWmqloOPArMTHIYcGBVdTftrm3bpr2vm4ETBjAPSZI0CAM9h2BSVa1ullcDk5rlycCKtnYr\ngMN7qa9s6jRfHweoqs3A+iQTBjguSZI0AIM+qbCqCqghGIskSRoh4wa43eokh1bVquZwwJqmvhKY\n0tbuCFp7BlY2yz3rW7aZCjyRZBxwcFU92duLdnZ2bl3u6Oigo6NjgMOXJOmFqauri66uriHvd6CB\n4DZgDq0TAOcAt7TVr0/yaVqHAqYD3VVVSTYkmQl0A2cDV/ToawnwblonKfaqPRBIkvRi1PMD8fz5\n84ek3z4DQZIbgOOBiUkeBz4KXAzclGQusBw4HaCqlia5CVgKbAbmNYcUAOYB1wD7AndU1Z1N/Wrg\nuiSPAGuBM4dkZpIkqd/6DARVddYOVp24g/YLgAW91O8DjumlvpEmUEjSnqC7ewnnnNM50sPYzvjx\ncNllnSM9DI1SAz1kIEnagWef3Ydp0zpHehjbWb68c6SHoFHMWxdLkiQDgSRJMhBIkiQMBJIkCQOB\nJEnCQCBJkjAQSJIkDASSJAkDgSRJwkAgSZIwEEiSJAwEkiQJA4EkScK/dqhhcN55naxbN9Kj2F53\n9wNMmzbSo5Ck0clAoCG3bh2j8k+/fvvbp470ECRp1PKQgSRJMhBIkiQDgSRJwkAgSZIwEEiSJAwE\nkiQJA4EkScJAIEmSMBBIkiQMBJIkCQOBJEliFAWCJLOTLEvySJLzR3o8kiS9mIyKQJBkLPC/gNnA\nDOCsJL82sqPa/bq6ukZ6CMNu+fKukR7CsHOOe4YXwxxfDP/nvBjmOFRGy187fCPwaFUtB0jyReAU\n4AcjOajdrauri46OjpEexrBavryLadM6RnoYw8o57hn2xDl2dy/hnHM6tz5/4IEujj22Y8TG0278\neLjsss4h7/fF8P/qUBktgeBw4PG25yuAmSM0lheE887rZN26kR5F77q7H2DatJEehaSenn12n23+\nNPny5Z2j5k+VL1/eOdJDeNEbLYGgRnoAPVUVn/rUp1m//und9prf+lYXzz3X2a+2Dz+8hje+8crh\nHdAAffvbp470ECRJuyhVI/9enORNQGdVzW6eXwg8X1WXtLUZ+YFKkjQKVVUG28doCQTjgB8CJwBP\nAN3AWVX1ojqHQJKkkTIqDhlU1eYk/zfwdWAscLVhQJKk3WdU7CGQJEkja1TchwAgyYQki5M8nGRR\nkvE7aLfTGxgl+R9Jnk8yYfhHvesGO88kFyX5fpIHktyVZMruG33/DMEc/zzJD5p5/n2Sg3ff6Ptn\nCOb4niQPJXkuyet338j71p+bhCW5oln//SSv25VtR4NBzvFvk6xO8uDuG/GuG+gck0xJ8s3m9/Nf\nkvzh7h15/w1ijvskubf5f3Rpkk/s3pH332B+V5t1Y5Pcn+Qrfb5YVY2KB/BJ4EPN8vnAxb20GQs8\nCkwDXgI8APxa2/opwJ3AvwITRnpOwzFP4MC2ducCnx3pOQ3DHN8OjGmWL+5t+5F+DMEcXw0cDXwT\neP1Iz6c/Y25rcxJwR7M8E1jS321Hw2Mwc2ye/xbwOuDBkZ7LMP0cDwWObZYPoHV+1574c9yv+ToO\nWAK8eaTnNNRzbGp/DHwBuK2v1xs1ewiAk4GFzfJCoLdr17bewKiqNgFbbmC0xaeBDw3rKAdvUPOs\nqvbrIA8AfjaMYx2owc5xcVU937S7FzhimMc7EIOd47Kqeni3jHTX9PVvDNrmXlX3AuOTHNrPbUeD\nwcyRqroHeGo3jncgBjrHSVW1qqoeaOo/p3WDuMm7b+j9NuA5Ns9/0bTZi9Yb75O7ZdS7ZlBzTHIE\nrcDwWaDPqxBGUyCYVFWrm+XVwKRe2vR2A6PDAZKcAqyoqn8e1lEO3qDmCZDkz5L8GzCH1ifo0WbQ\nc2zzX4A7hnZ4Q2Io5zia9GfMO2ozuR/bjgaDmeMLxUDnuE34TjKN1t6Qe4d8hIM3qDk2u9IfoPXv\n95tVtXQYxzpQg/1dvRT4E+B5+mG3XmWQZDGt3VE9fbj9SVVVer/vQK9nQCbZF/hTWruat5YHOs7B\nGq55tm33YeDDSS6g9QP/vYGOdaCGe47Na3wYeLaqrh/YKAdnd8xxFOrvmEfs39cQGOgcX0g/z0HP\nMckBwN8BH2z2FIw2g5pjVT0HHNuco/T1JB1V1TWE4xsKA51jkrwTWFNV9yfp6E8nuzUQVNXbd7Su\nOUnn0KpaleQwYE0vzVbSOk9giym00tAraR1j+X4SaCXA+5K8sap662dYDeM8e7qeEfr0PNxzTHIO\nrV1dJwzNiHfdbvw5jib9GXPPNkc0bV7Sj21Hg4HOceUwj2soDWqOSV4C3Ax8vqpuGcZxDsaQ/Byr\nan2S24E3AF1DP8xBGcwcTwNOTnISsA9wUJJrq+r/3OGrjfRJE20nPnwSOL9ZvoDeT9IaB/yI1pv/\nXuzgpCVG/0mFA54nML2t3bnAdSM9p2GY42zgIWDiSM9luObY1uabwHEjPZ9dHHP7SUxv4lcno/Xr\n3+dIPwYzx7b10xjdJxUO5ucY4Frg0pGexzDOcSIwvlneF7gbOGGk5zSUc+zR5njgK32+3khPuG3A\nE4B/AB4GFrX9sCYDt7e1+21aZ70+Cly4g75+zOgNBIOaJ61deA82vxg3Ay8f6TkNwxwfAR4D7m8e\nV470nIZhjv+Z1nG/fwdWAV8b6TntbMzAfwP+W1ub/9Ws/z5tV0n059/naHgMco430Lqj6sbmZ/h7\nIz2foZwj8GZax5wfaPs3OHuk5zPEczwG+F4zx38G/mSk5zIcv6tt64+nH1cZeGMiSZI0qq4ykCRJ\nI8RAIEmSDASSJMlAIEmSMBBIkiQMBJIkCQOBJEnCQCBJkoD/H1gJgfFc52KhAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fde95b819b0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "from matplotlib import pyplot as plt\n",
    "f, ax = plt.subplots(figsize=(7, 5))\n",
    "f.tight_layout()\n",
    "ax.hist(linear_preds - y[~train],label='Residuals Linear', color='b', alpha=.5);\n",
    "ax.set_title(\"Residuals\")\n",
    "ax.legend(loc='best');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "拟合的效果非常好。异常值很少，直方图也呈现出完美的正态分布钟形图。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##How it works..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当然这里我们用的是虚拟数据集，但是你也可以用更大的数据集合。例如，如果你在华尔街工作，有可能每天一个市场都有20亿条交易数据。现在如果有一周或一年的数据，用SGD算法就可能无法运行了。很难处理这么大的数据量，因为标准的梯度下降法每一步都要计算梯度，计算量非常庞大。\n",
    "\n",
    "标准的梯度下降法的思想是在每次迭代计算一个新的相关系数矩阵，然后用学习速率（learning rate）和目标函数（objective function）的梯度调整它，直到相关系数矩阵收敛为止。如果用伪代码写就是这样：\n",
    "\n",
    "```\n",
    "while not_converged:\n",
    "    w = w – learning_rate * gradient(cost(w))\n",
    "```\n",
    "\n",
    "这里涉及的变量包括：\n",
    "\n",
    "- `w`：相关系数矩阵\n",
    "- `learning_rate`：每次迭代时前进的长度。如果收敛效果不好，调整这个参数很重要\n",
    "- `gradient`：导数矩阵\n",
    "- `cost`：回归的残差平方和。后面我们会介绍，不同的分类方法中损失函数定义不同，具有可变性也是SGD应用广泛的理由之一。\n",
    "\n",
    "除了梯度函数有点复杂之外，这个方法还是可以的。随着相关系数向量的增加，梯度的计算也会变得越来越慢。每次更新之前，我们都需要对每个数据点计算新权重。\n",
    "\n",
    "SGD的工作方式稍有不同；每次迭代不是批量更新梯度，而是只更新新数据点的参数。这些数据点是随机选择的，因此称为随机梯度下降法。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
